
The POS field specifies the 1-based leftmost position in the reference genome to which the first aligned base of the read maps. For the sequence “SRR062634.6862698”, the alignment position is 10001, as shown in Figure 2.15. The POS value of an unmapped read is “0”.

The MAPQ field specifies the mapping quality as a Phred-scaled score, that is, -10 log10(p), where p is the probability that the mapping position is wrong. If the mapping quality is not available, the value is set to 255.
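The Phred scaling of MAPQ can be illustrated with a short Python sketch. The function names below are hypothetical helpers introduced for illustration only; the sketch assumes the usual Phred definition and treats 255 as "not available", as described above.

```python
import math


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred-scaled score to the probability that the mapping is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> int:
    """Convert an error probability to a Phred-scaled score (nearest integer)."""
    return round(-10 * math.log10(p))


def describe_mapq(mapq: int) -> str:
    """Return a human-readable description of a MAPQ value."""
    if mapq == 255:
        # 255 is the placeholder for an unavailable mapping quality.
        return "mapping quality not available"
    return f"P(wrong position) = {phred_to_error_prob(mapq):.4g}"


if __name__ == "__main__":
    for value in (0, 20, 30, 255):
        print(value, describe_mapq(value))
```

Under this scaling, a MAPQ of 30 corresponds to roughly a 1 in 1000 chance that the reported position is wrong.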

The CIGAR field contains the CIGAR string, which is a sequence of operation lengths, each followed by its operation code. It indicates where, in the aligned read, an operation such as a match/mismatch, deletion, or insertion took place. Table 2.4 lists the CIGAR operations that can be associated with a read alignment. For instance, in the SAM file shown in Figure 2.15, the sequence “SRR062634.6862698” has the CIGAR string “30M70S”, which means that the first 30 bases of the read are aligned to the reference sequence (30M; each aligned base may be a match or a mismatch) and the remaining 70 bases are soft-clipped (70S): they are not part of the alignment but are still present in the SEQ field.
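As a minimal sketch of how a CIGAR string can be broken into its (length, operation) pairs, the following Python snippet uses a regular expression. The function name `parse_cigar` is a hypothetical helper, and the set of operation codes (M, I, D, N, S, H, P, =, X) is assumed from the SAM specification rather than taken from Table 2.4.

```python
import re
from typing import List, Tuple

# One CIGAR token: a run length followed by a single operation code.
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
# A whole CIGAR string: one or more tokens and nothing else.
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs.

    Returns an empty list for "*", which marks an unavailable CIGAR.
    """
    if cigar == "*":
        return []
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in _CIGAR_TOKEN.findall(cigar)]


if __name__ == "__main__":
    print(parse_cigar("30M70S"))
```

For the CIGAR string discussed above, `parse_cigar("30M70S")` yields the two pairs `(30, "M")` and `(70, "S")`.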

The RNEXT field contains the reference sequence name of the primary alignment of the NEXT read in the template. It is set to “=” if that name is identical to RNAME and to “*” if the information is unavailable. The PNEXT field specifies the position of the primary alignment of the NEXT read in the template. This field is set to “0” when the information is unavailable.
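The way RNEXT and PNEXT together locate the next read can be sketched in Python as follows. The helper `mate_location` is hypothetical, and the sketch assumes the SAM conventions that “=” means "same reference as RNAME", “*” means "unavailable", and a PNEXT of 0 means "position unavailable".

```python
from typing import Optional, Tuple


def mate_location(rname: str, rnext: str, pnext: int) -> Tuple[Optional[str], Optional[int]]:
    """Resolve the reference name and position of the next read in the template.

    Each element of the returned pair is None when that piece of
    information is unavailable.
    """
    if rnext == "*":
        ref = None            # reference of the next read is unknown
    elif rnext == "=":
        ref = rname           # same reference as the current read
    else:
        ref = rnext           # next read maps to a different reference
    pos = pnext if pnext > 0 else None  # 0 marks an unavailable position
    return ref, pos


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(mate_location("chr1", "=", 10200))
    print(mate_location("chr1", "*", 0))
```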

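The TLEN field specifies the signed observed template length. The value is positive for the leftmost read and negative for the rightmost read; the sign is undefined for any read in the middle. The field is set to “0” for a single-read template or when the information is unavailable.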
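The sign convention for TLEN can be illustrated with a small Python sketch for a two-read template mapped to the same reference. The function `template_lengths` and its arguments are hypothetical; the sketch assumes that the template length spans from the leftmost mapped base to the rightmost mapped base, with 1-based inclusive coordinates. Aligners may compute TLEN somewhat differently, so this is an illustration rather than a reference implementation.

```python
from typing import Tuple


def template_lengths(start1: int, end1: int, start2: int, end2: int) -> Tuple[int, int]:
    """Return (TLEN of read 1, TLEN of read 2) for a two-read template.

    Coordinates are 1-based and inclusive, and both reads are assumed
    to be mapped to the same reference sequence.
    """
    leftmost = min(start1, start2)
    rightmost = max(end1, end2)
    length = rightmost - leftmost + 1
    # The read that starts further left gets the positive value.
    # Ties are given to read 1 here, which is an arbitrary choice.
    if start1 <= start2:
        return length, -length
    return -length, length


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    print(template_lengths(100, 199, 300, 399))
```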

The SEQ field contains the read sequence. This field is set to “*” if the sequence was not stored. Otherwise, the length of the sequence must equal the sum of the lengths of the CIGAR operations that consume read bases (M, I, S, =, and X). The “=” symbol indicates that the base of the read is identical to the reference base. All mapped reads are represented on the forward genomic strand (plus strand). Therefore, if a read is mapped to the reverse strand (minus strand) of the reference sequence, the SEQ field contains the reverse complement of the original read sequence.
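The two properties of SEQ described above can be checked with a brief Python sketch. It assumes, following the SAM specification, that only the operations M, I, S, =, and X consume read bases and therefore count toward the length of SEQ. The helper names are hypothetical, and the complement table covers only A, C, G, T, and N.

```python
import re

# CIGAR operations that consume bases of the read (query).
_QUERY_CONSUMING = set("MIS=X")
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
# Complement table restricted to A, C, G, T, N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def query_length(cigar: str) -> int:
    """Sum the lengths of the CIGAR operations that consume read bases."""
    return sum(
        int(length)
        for length, op in _CIGAR_TOKEN.findall(cigar)
        if op in _QUERY_CONSUMING
    )


def seq_matches_cigar(seq: str, cigar: str) -> bool:
    """Check that the SEQ length agrees with the CIGAR string.

    The check is skipped (returns True) when either field is "*".
    """
    if seq == "*" or cigar == "*":
        return True
    return len(seq) == query_length(cigar)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(query_length("30M70S"))
    print(reverse_complement("AACGT"))
```

A read with the CIGAR string “30M70S” is therefore expected to have a SEQ of 100 bases, because both M and S consume read bases.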

The QUAL field contains the Phred quality scores encoded as ASCII characters (Phred+33). The string in this field is the same as the quality string in the FASTQ file, reversed if the read is stored as its reverse complement. This field may be set to “*” if

FIGURE 2.16  Using “samtools flags” to convert numeric FLAG representations into textual.